In this webpage, we will explore a mortality and poverty dataset and attempt to render a plotly graph from it. To render the graph, we first need the plotly library, plus the tidyverse for reading and reshaping the data.
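library() only loads packages that are already installed. If tidyverse or plotly is not installed yet, a one-off step like the following has to run first. This is a minimal sketch that installs only the packages it cannot find:

```r
# Packages this page relies on
pkgs <- c("tidyverse", "plotly")

# requireNamespace() returns FALSE for any package that is not installed
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing (needs an internet connection and a CRAN mirror)
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```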
library(tidyverse) # attaches readr, tidyr, dplyr and friends, so dplyr needs no separate call
library(plotly)    # interactive graphs
With the libraries loaded, we can move on to loading the data.
This part is simple: we import the two data files, reshape them into long format, join them together, and clean them up by removing rows that are regional or income-group aggregates rather than countries (haha, good trick Elias!)
# read in the data
mortality <- read_csv("mortality.csv")
poverty <- read_csv("poverty.csv")
# explore the data
head(mortality)
## # A tibble: 6 x 59
## country `1960` `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan~ 356. 350. 345. 339. 334. 328. 323 318. 312. 307.
## 2 Albania NA NA NA NA NA NA NA NA NA NA
## 3 Algeria 242. 242. 243. 244. 245 246. 247 247. 246. 244
## 4 Andorra NA NA NA NA NA NA NA NA NA NA
## 5 Angola NA NA NA NA NA NA NA NA NA NA
## 6 Antigu~ NA NA NA NA NA NA NA NA NA NA
## # ... with 48 more variables: `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
## # `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
## # `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
## # `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
## # `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
## # `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
## # `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
## # `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
## # `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
## # `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>
head(poverty)
## # A tibble: 6 x 35
## country `1996` `2002` `2005` `2008` `2012` `2000` `1986` `1987` `1991` `1992`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Albania 12.4 16.6 9.79 6.11 6.79 NA NA NA NA NA
## 2 Angola NA NA NA 54.5 NA 54.2 NA NA NA NA
## 3 Argent~ 8.98 25.4 11.4 6.79 3.69 12.1 0 2.22 3.91 4.48
## 4 Armenia 40.4 49.4 24.7 12.9 17.4 NA NA NA NA NA
## 5 Azerba~ NA 0.24 0 2.51 NA NA NA NA NA NA
## 6 Bangla~ NA NA 63.0 NA NA 70.1 NA NA 82.4 NA
## # ... with 24 more variables: `1993` <dbl>, `1994` <dbl>, `1995` <dbl>,
## # `1997` <dbl>, `1998` <dbl>, `1999` <dbl>, `2001` <dbl>, `2003` <dbl>,
## # `2004` <dbl>, `2006` <dbl>, `2007` <dbl>, `2009` <dbl>, `2010` <dbl>,
## # `2011` <dbl>, `2013` <dbl>, `2014` <dbl>, `1983` <dbl>, `1985` <dbl>,
## # `1988` <dbl>, `1990` <dbl>, `1981` <dbl>, `1982` <dbl>, `1984` <dbl>,
## # `1989` <dbl>
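Both tables are in wide format, with one column per year. The next step reshapes them into long format with tidyr's pivot_longer(). To see what that does, here it is on a made-up two-country table (the values are placeholders, not real data):

```r
library(tidyverse)

# A made-up two-country table in the same wide layout (placeholder values)
wide <- tibble(country = c("A", "B"), `1990` = c(1, 2), `1991` = c(3, 4))

# One row per country-year; the old column names become the "year" column
long <- wide %>%
  pivot_longer(cols = !country, names_to = "year", values_to = "mrate")
long
```

The result has four rows (A-1990, A-1991, B-1990, B-1991). Note that year comes out as a character column, because it is built from column names. Since both tidy tables get a character year, the join on year still lines up; mutate(year = as.integer(year)) would convert it if a numeric year is needed.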
# create tidy (long) datasets: one row per country-year
mortality_tidy <- mortality %>%
  pivot_longer(cols = !country, names_to = "year", values_to = "mrate")
poverty_tidy <- poverty %>%
  pivot_longer(cols = !country, names_to = "year", values_to = "prate")
# join the datasets (keeping only country-years present in both),
# drop rows with missing values, and remove regional/income-group aggregates
measurements <- inner_join(mortality_tidy, poverty_tidy, by = c("country", "year")) %>%
  na.omit() %>%
  filter(!country %in% c(
    "Europe & Central Asia", "East Asia & Pacific",
    "Middle East & North Africa", "Sub-Saharan Africa", "Latin America & Caribbean",
    "Low income", "Low & middle income",
    "Lower middle income", "Middle income", "Upper middle income",
    "Fragile and conflict affected situations", "IDA total", "IDA only", "IDA blend",
    "IDA & IBRD total", "IBRD only"
  ))
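A hand-written exclusion list is easy to get slightly wrong, so it can help to check what is left. The helper below is hypothetical (not part of the original analysis) and uses a keyword heuristic: it can miss aggregates that lack these keywords, such as plain region names, and it could in principle flag a real country.

```r
library(tidyverse)

# Hypothetical helper: list country names that look like World Bank-style
# regional or income-group aggregates, based on keyword matching only
suspect_aggregates <- function(df) {
  df %>%
    distinct(country) %>%
    filter(str_detect(country, "income|IDA|IBRD|&|Fragile"))
}
```

Running suspect_aggregates(mortality) lists the aggregate-like names in the raw data, which is a handy starting point for the exclusion list; suspect_aggregates(measurements) should come back empty if the filter caught all of them.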
With the data ready, let’s quickly fit a linear regression of the mortality rate (mrate) on the poverty rate (prate):
model <- lm(mrate ~ prate, measurements)
summary(model)
##
## Call:
## lm(formula = mrate ~ prate, data = measurements)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79.878 -12.445 -1.050 7.557 180.451
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.32228 1.21747 5.193 2.46e-07 ***
## prate 1.33676 0.03063 43.643 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.27 on 1113 degrees of freedom
## Multiple R-squared: 0.6312, Adjusted R-squared: 0.6308
## F-statistic: 1905 on 1 and 1113 DF, p-value: < 2.2e-16
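The fitted line is mrate ≈ 6.32 + 1.34 × prate: each one-point increase in prate is associated with about 1.34 more in mrate, and prate alone accounts for about 63% of the variation in mrate (R-squared 0.6312). Plugging a few poverty rates in by hand makes the coefficients concrete. The coefficients below are copied from the summary output; predict(model, newdata = tibble(prate = c(0, 10, 50))) should give the same values at full precision.

```r
# Coefficients copied from summary(model) above
b0 <- 6.32228  # intercept
b1 <- 1.33676  # slope on prate

# Predicted mrate at a few illustrative poverty rates
prate_values <- c(0, 10, 50)
predicted_mrate <- b0 + b1 * prate_values
predicted_mrate
```

This gives roughly 6.3, 19.7 and 73.2.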
Now that we have the data and the model, all that’s left to do is to plot the figure (fingers crossed, let’s hope this works!!)
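The code for the figure itself is not shown on this page. Below is a minimal sketch of one way to draw it with plotly, assuming the columns created above (country, prate, mrate). The function name plot_mortality_poverty is hypothetical, and it refits the same lm(mrate ~ prate) internally so it can stand alone. Setting type and mode explicitly keeps plotly from having to guess them.

```r
library(tidyverse)
library(plotly)

# Scatter plot of mortality rate against poverty rate, with the fitted
# linear regression line on top. Expects columns country, prate and mrate.
plot_mortality_poverty <- function(data) {
  # drop incomplete rows so fitted() returns one value per plotted point
  data <- drop_na(data, prate, mrate)
  fit <- lm(mrate ~ prate, data = data)

  plot_ly(data, x = ~prate, y = ~mrate, text = ~country,
          type = "scatter", mode = "markers", name = "Country-years") %>%
    # add_lines() inherits x and text from plot_ly(); y is the fitted values
    add_lines(y = fitted(fit), name = "Linear fit") %>%
    layout(xaxis = list(title = "Poverty rate (prate)"),
           yaxis = list(title = "Mortality rate (mrate)"))
}
```

Calling plot_mortality_poverty(measurements) should then render an interactive scatter plot where hovering over a point shows the country.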
And that’s how it works, folks!